Results 1 - 20 of 1,242
1.
BMC Med Inform Decis Mak ; 23(Suppl 4): 299, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38326827

ABSTRACT

BACKGROUND: In this era of big data, data harmonization is an important step to ensure reproducible, scalable, and collaborative research. Thus, terminology mapping is a necessary step to harmonize heterogeneous data. For example, the mapping between the Medical Dictionary for Regulatory Activities (MedDRA) and the International Classification of Diseases (ICD) is essential for drug safety and pharmacovigilance research. Our main objective is to provide a quantitative and qualitative analysis of the mapping status between MedDRA and ICD. We focus on evaluating the current mapping status between MedDRA and ICD through the Unified Medical Language System (UMLS) and Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). We summarized the current mapping statistics and evaluated the quality of the current MedDRA-ICD mapping; for unmapped terms, we used our self-developed algorithm to rank the best possible mapping candidates for additional mapping coverage. RESULTS: The identified MedDRA-ICD mapped pairs cover 27.23% of the overall MedDRA preferred terms (PT). The systematic quality analysis demonstrated that, among the mapped pairs provided by UMLS, only 51.44% are considered an exact match. For the 2400 sampled unmapped terms, 56 of the 2400 MedDRA Preferred Terms (PT) could have exact match terms from ICD. CONCLUSION: Some of the mapped pairs between MedDRA and ICD are not exact matches due to differences in granularity and focus. For 72% of the unmapped PT terms, the identified exact match pairs illustrate the possibility of identifying additional mapped pairs. Referring to its own mapping standard, some of the unmapped terms should qualify for the expansion of MedDRA to ICD mapping in UMLS.


Subjects
Adverse Drug Reaction Reporting Systems , International Classification of Diseases , Humans , Unified Medical Language System , Pharmacovigilance , Algorithms
2.
J Biomed Inform ; 149: 104580, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38163514

ABSTRACT

The complex linguistic structures and specialized terminology of expert-authored content limit the accessibility of biomedical literature to the general public. Automated methods have the potential to render this literature more interpretable to readers with different educational backgrounds. Prior work has framed such lay language generation as a summarization or simplification task. However, adapting biomedical text for the lay public includes the additional and distinct task of background explanation: adding external content in the form of definitions, motivation, or examples to enhance comprehensibility. This task is especially challenging because the source document may not include the required background knowledge. Furthermore, background explanation capabilities have yet to be formally evaluated, and little is known about how best to enhance them. To address this problem, we introduce Retrieval-Augmented Lay Language (RALL) generation, which intuitively fits the need for external knowledge beyond that in expert-authored source documents. In addition, we introduce CELLS, the largest (63k pairs) and broadest-ranging (12 journals) parallel corpus for lay language generation. To evaluate RALL, we augmented state-of-the-art text generation models with information retrieval of either term definitions from the UMLS and Wikipedia, or embeddings of explanations from Wikipedia documents. Of these, embedding-based RALL models improved summary quality and simplicity while maintaining factual correctness, suggesting that Wikipedia is a helpful source for background explanation in this context. We also evaluated the ability of both an open-source Large Language Model (Llama 2) and a closed-source Large Language Model (GPT-4) in background explanation, with and without retrieval augmentation. Results indicate that these LLMs can generate simplified content, but that the summary quality is not ideal. 
Taken together, this work presents the first comprehensive study of background explanation for lay language generation, paving the path for disseminating scientific knowledge to a broader audience. Our code and data are publicly available at: https://github.com/LinguisticAnomalies/pls_retrieval.


Subjects
Language , Natural Language Processing , Information Storage and Retrieval , Linguistics , Unified Medical Language System
3.
J Am Med Inform Assoc ; 31(2): 426-434, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-37952122

ABSTRACT

OBJECTIVE: To construct an exhaustive Complementary and Integrative Health (CIH) Lexicon (CIHLex) to help better represent the often underrepresented physical and psychological CIH approaches in standard terminologies, and to also apply state-of-the-art natural language processing (NLP) techniques to help recognize them in the biomedical literature. MATERIALS AND METHODS: We constructed the CIHLex by integrating various resources, compiling and integrating data from biomedical literature and relevant sources of knowledge. The Lexicon encompasses 724 unique concepts with 885 corresponding unique terms. We matched these concepts to the Unified Medical Language System (UMLS), and we developed and utilized BERT models comparing their efficiency in CIH named entity recognition to well-established models including MetaMap and CLAMP, as well as the large language model GPT3.5-turbo. RESULTS: Of the 724 unique concepts in CIHLex, 27.2% could be matched to at least one term in the UMLS. About 74.9% of the mapped UMLS Concept Unique Identifiers were categorized as "Therapeutic or Preventive Procedure." Among the models applied to CIH named entity recognition, BLUEBERT delivered the highest macro-average F1-score of 0.91, surpassing other models. CONCLUSION: Our CIHLex significantly augments representation of CIH approaches in biomedical literature. Demonstrating the utility of advanced NLP models, BERT notably excelled in CIH entity recognition. These results highlight promising strategies for enhancing standardization and recognition of CIH terminology in biomedical contexts.
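
The macro-average F1-score reported above is the unweighted mean of per-label F1 scores. The sketch below is a minimal illustration under simplifying assumptions: it scores individual labels rather than entity spans (named entity recognition is usually scored at the span level), and the label names and toy data are hypothetical, not taken from the study.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-average F1: compute F1 per label, then take the unweighted mean."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was not the gold label
            fn[g] += 1  # missed the gold label g
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only
gold = ["B-CIH", "O", "O", "B-CIH"]
pred = ["B-CIH", "O", "B-CIH", "O"]
score = macro_f1(gold, pred)
```

Because every label contributes equally, a rare label that is recognized poorly lowers the macro-average as much as a frequent one would.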


Subjects
Algorithms , Unified Medical Language System , Natural Language Processing , Language
4.
Article in English | MEDLINE | ID: mdl-38082992

ABSTRACT

Clinical Practice Guidelines (CPGs) for cancer diseases evolve rapidly due to new evidence generated by active research. Currently, CPGs are primarily published in a document format that is ill-suited for managing this developing knowledge. A knowledge model of the guidelines document suitable for programmatic interaction is required. This work proposes an automated method for extraction of knowledge from National Comprehensive Cancer Network (NCCN) CPGs in Oncology and generating a structured model containing the retrieved knowledge. The proposed method was tested using two versions of NCCN Non-Small Cell Lung Cancer (NSCLC) CPG to demonstrate the effectiveness in faithful extraction and modeling of knowledge. Three enrichment strategies using Cancer staging information, Unified Medical Language System (UMLS) Metathesaurus & National Cancer Institute thesaurus (NCIt) concepts, and Node classification are also presented to enhance the model towards enabling programmatic traversal and querying of cancer care guidelines. The Node classification was performed using a Support Vector Machine (SVM) model, achieving a classification accuracy of 0.81 with 10-fold cross-validation.


Subjects
Non-Small-Cell Lung Carcinoma , Lung Neoplasms , Humans , Unified Medical Language System , Controlled Vocabulary , Practice Guidelines as Topic
5.
Article in English | MEDLINE | ID: mdl-38083556

ABSTRACT

Recent advances in Natural Language Processing (NLP) have produced state-of-the-art results on several sequence-to-sequence (seq2seq) tasks. Enhancements in embedders and their training methodologies have shown significant improvement on downstream tasks. Word vector models like Word2Vec, FastText & GloVe were widely used over one-hot encoded vectors for years until the advent of deep contextualized embedders. Protein sequences consist of 20 naturally occurring amino acids that can be treated as the language of nature. These amino acids, in combination with each other, make up the biological functions. The choice of vector representation and architecture design for a biological task is highly dependent upon the nature of the task. We utilize unlabelled protein sequences to train a Convolution and Gated Recurrent Network (CGRN) embedder using the Masked Language Modeling (MLM) technique, which shows a significant performance boost in a resource-constrained setting on two downstream tasks, i.e., an F1-score (Q8) of 73.1% on Secondary Structure Prediction (SSP) & an F1-score of 84% on Intrinsically Disordered Region Prediction (IDRP). We also compare different architectures on downstream tasks to show the impact of the nature of the biological task on the performance of the model.


Subjects
Language , Natural Language Processing , Amino Acid Sequence , Unified Medical Language System , Amino Acids
6.
BMC Bioinformatics ; 24(1): 405, 2023 Oct 29.
Article in English | MEDLINE | ID: mdl-37898795

ABSTRACT

BACKGROUND: Extracting information from free texts using natural language processing (NLP) can save time and reduce the hassle of manually extracting large quantities of data from incredibly complex clinical notes of cancer patients. This study aimed to systematically review studies that used NLP methods to identify cancer concepts from clinical notes automatically. METHODS: PubMed, Scopus, Web of Science, and Embase were searched for English language papers using a combination of the terms concerning "Cancer", "NLP", "Coding", and "Registries" until June 29, 2021. Two reviewers independently assessed the eligibility of papers for inclusion in the review. RESULTS: Most of the software programs used for concept extraction reported were developed by the researchers (n = 7). Rule-based algorithms were the most frequently used algorithms for developing these programs. In most articles, the criteria of accuracy (n = 14) and sensitivity (n = 12) were used to evaluate the algorithms. In addition, Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) and Unified Medical Language System (UMLS) were the most commonly used terminologies to identify concepts. Most studies focused on breast cancer (n = 4, 19%) and lung cancer (n = 4, 19%). CONCLUSION: The use of NLP for extracting the concepts and symptoms of cancer has increased in recent years. Rule-based algorithms are popular among developers. Due to these algorithms' high accuracy and sensitivity in identifying and extracting cancer concepts, we suggest that future studies use these algorithms to extract the concepts of other diseases as well.


Subjects
Breast Neoplasms , Natural Language Processing , Humans , Female , Algorithms , Software , Unified Medical Language System
7.
J Med Internet Res ; 25: e45225, 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37862061

ABSTRACT

BACKGROUND: The global pandemics of severe acute respiratory syndrome, Middle East respiratory syndrome, and COVID-19 have caused unprecedented crises for public health. Coronaviruses are constantly evolving, and it is unknown which new coronavirus will emerge and when the next coronavirus will sweep across the world. Knowledge graphs are expected to help discover the pathogenicity and transmission mechanism of viruses. OBJECTIVE: The aim of this study was to discover potential targets and candidate drugs to repurpose for coronaviruses through a knowledge graph-based approach. METHODS: We propose a computational and evidence-based knowledge discovery approach to identify potential targets and candidate drugs for coronaviruses from biomedical literature and well-known knowledge bases. To organize the semantic triples extracted automatically from biomedical literature, a semantic conversion model was designed. The literature knowledge was associated and integrated with existing drug and gene knowledge through semantic mapping, and the coronavirus knowledge graph (CovKG) was constructed. We adopted both the knowledge graph embedding model and the semantic reasoning mechanism to discover unrecorded mechanisms of drug action as well as potential targets and drug candidates. Furthermore, we have provided evidence-based support with a scoring and backtracking mechanism. RESULTS: The constructed CovKG contains 17,369,620 triples, of which 641,195 were extracted from biomedical literature, covering 13,065 concept unique identifiers, 209 semantic types, and 97 semantic relations of the Unified Medical Language System. Through multi-source knowledge integration, 475 drugs and 262 targets were mapped to existing knowledge, and 41 new drug mechanisms of action were found by semantic reasoning, which were not recorded in the existing knowledge base. Among the knowledge graph embedding models, TransR outperformed others (mean reciprocal rank=0.2510, Hits@10=0.3505). 
A total of 33 potential targets and 18 drug candidates were identified for coronaviruses. Among them, 7 novel drugs (ie, quinine, nelfinavir, ivermectin, asunaprevir, tylophorine, Artemisia annua extract, and resveratrol) and 3 highly ranked targets (ie, angiotensin converting enzyme 2, transmembrane serine protease 2, and M protein) were further discussed. CONCLUSIONS: We showed the effectiveness of a knowledge graph-based approach in potential target discovery and drug repurposing for coronaviruses. Our approach can be extended to other viruses or diseases for biomedical knowledge discovery and relevant applications.
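
The two link-prediction metrics quoted in the results (mean reciprocal rank and Hits@10) are both computed from the rank that a model assigns to each correct triple. The following is a minimal sketch of those definitions; the function names and the toy ranks are illustrative assumptions and do not come from the CovKG evaluation.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose correct answer is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples
ranks = [1, 2, 4, 20]
mrr = mean_reciprocal_rank(ranks)
hits10 = hits_at_k(ranks, 10)
```

Mean reciprocal rank rewards placing the correct answer near the very top, while Hits@10 only asks whether it appears anywhere in the first ten candidates, so the two values can diverge considerably for the same model.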


Subjects
COVID-19 , Drug Repositioning , Humans , Automated Pattern Recognition , Knowledge Bases , Unified Medical Language System
8.
IEEE J Biomed Health Inform ; 27(12): 6029-6038, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37703167

ABSTRACT

Medical entity normalization is an important task for medical information processing. The Unified Medical Language System (UMLS), a well-developed medical terminology system, is crucial for medical entity normalization. However, the UMLS primarily consists of English medical terms. For languages other than English, such as Chinese, a significant challenge for normalizing medical entities is the lack of robust terminology systems. To address this issue, we propose a translation-enhancing training strategy that incorporates the translation and synonym knowledge of the UMLS into a language model using the contrastive learning approach. In this work, we proposed a cross-lingual pre-trained language model called TeaBERT, which can align synonymous Chinese and English medical entities across languages at the concept level. As the evaluation results showed, the TeaBERT language model outperformed previous cross-lingual language models with Acc@5 values of 92.54%, 87.14% and 84.77% on the ICD10-CN, CHPO and RealWorld-v2 datasets, respectively. It also achieved a new state-of-the-art cross-lingual entity mapping performance without fine-tuning. The translation-enhancing strategy is applicable to other languages that face the similar challenge due to the absence of well-developed medical terminology systems.


Subjects
Language , Unified Medical Language System , International Classification of Diseases , Natural Language Processing
9.
J Am Med Inform Assoc ; 30(12): 1895-1903, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37615994

ABSTRACT

OBJECTIVE: Outcomes are important clinical study information. Despite progress in automated extraction of PICO (Population, Intervention, Comparison, and Outcome) entities from PubMed, rarely are these entities encoded by standard terminology to achieve semantic interoperability. This study aims to evaluate the suitability of the Unified Medical Language System (UMLS) and SNOMED-CT in encoding outcome concepts in randomized controlled trial (RCT) abstracts. MATERIALS AND METHODS: We iteratively developed and validated an outcome annotation guideline and manually annotated clinically significant outcome entities in the Results and Conclusions sections of 500 randomly selected RCT abstracts on PubMed. The extracted outcomes were fully, partially, or not mapped to the UMLS via MetaMap based on established heuristics. Manual UMLS browser search was performed for select unmapped outcome entities to further differentiate between UMLS and MetaMap errors. RESULTS: Only 44% of 2617 outcome concepts were fully covered in the UMLS, among which 67% were complex concepts that required the combination of 2 or more UMLS concepts to represent them. SNOMED-CT was present as a source in 61% of the fully mapped outcomes. DISCUSSION: Domains such as Metabolism and Nutrition, and Infections and Infectious Diseases need expanded outcome concept coverage in the UMLS and MetaMap. Future work is warranted to similarly assess the terminology coverage for P, I, C entities. CONCLUSION: Computational representation of clinical outcomes is important for clinical evidence extraction and appraisal and yet faces challenges from the inherent complexity and lack of coverage of these concepts in UMLS and SNOMED-CT, as demonstrated in this study.


Subjects
Systematized Nomenclature of Medicine , Unified Medical Language System , PubMed , Randomized Controlled Trials as Topic
10.
J Am Med Inform Assoc ; 30(12): 1887-1894, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37528056

ABSTRACT

OBJECTIVE: Use heuristic, deep learning (DL), and hybrid AI methods to predict semantic group (SG) assignments for new UMLS Metathesaurus atoms, with target accuracy ≥95%. MATERIALS AND METHODS: We used train-test datasets from successive 2020AA-2022AB UMLS Metathesaurus releases. Our heuristic "waterfall" approach employed a sequence of 7 different SG prediction methods. Atoms not qualifying for a method were passed on to the next method. The DL approach generated BioWordVec and SapBERT embeddings for atom names, BioWordVec embeddings for source vocabulary names, and BioWordVec embeddings for atom names of the second-to-top nodes of an atom's source hierarchy. We fed a concatenation of the 4 embeddings into a fully connected multilayer neural network with an output layer of 15 nodes (one for each SG). For both approaches, we developed methods to estimate the probability that their predicted SG for an atom would be correct. Based on these estimations, we developed 2 hybrid SG prediction methods combining the strengths of heuristic and DL methods. RESULTS: The heuristic waterfall approach accurately predicted 94.3% of SGs for 1 563 692 new unseen atoms. The DL accuracy on the same dataset was also 94.3%. The hybrid approaches achieved an average accuracy of 96.5%. CONCLUSION: Our study demonstrated that AI methods can predict SG assignments for new UMLS atoms with sufficient accuracy to be potentially useful as an intermediate step in the time-consuming task of assigning new atoms to UMLS concepts. We showed that for SG prediction, combining heuristic methods and DL methods can produce better results than either alone.
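
The "waterfall" idea described in the methods, where each atom is tried against an ordered sequence of prediction methods and passed on when a method does not apply, can be sketched as below. This is only an illustration of the control flow: the two rules, their names, and the example atoms are hypothetical and are not the seven methods used in the study.

```python
from typing import Callable, Optional, Sequence, Tuple

# A method returns a semantic group name, or None if the atom does not qualify.
Method = Callable[[str], Optional[str]]


def suffix_rule(name: str) -> Optional[str]:
    """Hypothetical rule: names ending in 'itis' are treated as disorders."""
    return "Disorders" if name.lower().endswith("itis") else None


def keyword_rule(name: str) -> Optional[str]:
    """Hypothetical rule: names mentioning 'tablet' are treated as drugs."""
    return "Chemicals & Drugs" if "tablet" in name.lower() else None


def waterfall_predict(
    name: str, methods: Sequence[Method]
) -> Tuple[Optional[str], Optional[int]]:
    """Try each method in order; return the first prediction and its position."""
    for index, method in enumerate(methods):
        label = method(name)
        if label is not None:
            return label, index
    return None, None  # no method qualified for this atom


METHODS = [suffix_rule, keyword_rule]
first = waterfall_predict("Appendicitis", METHODS)
second = waterfall_predict("Aspirin 81 MG Oral Tablet", METHODS)
```

Returning the index of the method that fired makes it possible to estimate accuracy separately for each stage, which is one plausible way to decide when to defer to another predictor.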


Subjects
Deep Learning , Heuristics , Semantics , Unified Medical Language System , Computer Neural Networks
11.
Sci Rep ; 13(1): 14214, 2023 Aug 30.
Article in English | MEDLINE | ID: mdl-37648800

ABSTRACT

One of the artificial intelligence applications in the biomedical field is knowledge-intensive question-answering. As domain expertise is particularly crucial in this field, we propose a method for efficiently infusing biomedical knowledge into pretrained language models, ultimately targeting biomedical question-answering. Transferring all semantics of a large knowledge graph into the entire model requires too many parameters, increasing computational cost and time. We investigate an efficient approach that leverages adapters to inject Unified Medical Language System knowledge into pretrained language models, and we question the need to use all semantics in the knowledge graph. This study focuses on strategies for partitioning the knowledge graph and either discarding or merging some groups for more efficient pretraining. According to the results on three biomedical question-answering finetuning datasets, the adapters pretrained on semantically partitioned groups showed more efficient performance in terms of evaluation metrics, required parameters, and time. The results also show that discarding groups with fewer concepts is a better direction for small datasets, and merging these groups is better for large datasets. Furthermore, the metric results show a slight improvement, demonstrating that the adapter methodology is rather insensitive to the group formulation.


Subjects
Artificial Intelligence , Unified Medical Language System , Benchmarking , Knowledge , Language
12.
PLoS One ; 18(8): e0281858, 2023.
Article in English | MEDLINE | ID: mdl-37540684

ABSTRACT

PURPOSE: To present a classification of inherited retinal diseases (IRDs) and evaluate its content coverage in comparison with common standard terminology systems. METHODS: In this comparative cross-sectional study, a panel of subject matter experts annotated a list of IRDs based on a comprehensive review of the literature. Then, they leveraged clinical terminologies from various reference sets including Unified Medical Language System (UMLS), Online Mendelian Inheritance in Man (OMIM), International Classification of Diseases (ICD-11), Systematized Nomenclature of Medicine (SNOMED-CT) and Orphanet Rare Disease Ontology (ORDO). RESULTS: Initially, we generated a hierarchical classification of 62 IRD diagnosis concepts in six categories. Subsequently, the classification was extended to 164 IRD diagnoses after adding concepts from various standard terminologies. Finally, 158 concepts were selected to be classified into six categories and genetic subtypes of 412 cases were added to the related concepts. UMLS has the greatest content coverage of 90.51% followed respectively by SNOMED-CT (83.54%), ORDO (81.01%), OMIM (60.76%), and ICD-11 (60.13%). There were 53 IRD concepts (33.54%) that were covered by all five investigated systems. However, 2.53% of the IRD concepts in our classification were not covered by any of the standard terminologies. CONCLUSIONS: This comprehensive classification system was established to organize IRD diseases based on phenotypic and genotypic specifications. It could potentially be used for IRD clinical documentation purposes and could also be considered a preliminary step forward to developing a more robust standard ontology for IRDs or updating available standard terminologies. In comparison, the greatest content coverage of our proposed classification was related to the UMLS Metathesaurus.


Subjects
Retinal Diseases , Systematized Nomenclature of Medicine , Humans , Cross-Sectional Studies , Unified Medical Language System , International Classification of Diseases , Retinal Diseases/diagnosis , Retinal Diseases/genetics
13.
J Am Med Inform Assoc ; 30(12): 1954-1964, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37550244

ABSTRACT

OBJECTIVE: Identifying study-eligible patients within clinical databases is a critical step in clinical research. However, accurate query design typically requires extensive technical and biomedical expertise. We sought to create a system capable of generating data model-agnostic queries while also providing novel logical reasoning capabilities for complex clinical trial eligibility criteria. MATERIALS AND METHODS: The task of query creation from eligibility criteria requires solving several text-processing problems, including named entity recognition and relation extraction, sequence-to-sequence transformation, normalization, and reasoning. We incorporated hybrid deep learning and rule-based modules for these, as well as a knowledge base of the Unified Medical Language System (UMLS) and linked ontologies. To enable data-model agnostic query creation, we introduce a novel method for tagging database schema elements using UMLS concepts. To evaluate our system, called LeafAI, we compared the capability of LeafAI to a human database programmer to identify patients who had been enrolled in 8 clinical trials conducted at our institution. We measured performance by the number of actual enrolled patients matched by generated queries. RESULTS: LeafAI matched a mean 43% of enrolled patients with 27 225 eligible across 8 clinical trials, compared to 27% matched and 14 587 eligible in queries by a human database programmer. The human programmer spent 26 total hours crafting queries compared to several minutes by LeafAI. CONCLUSIONS: Our work contributes a state-of-the-art data model-agnostic query generation system capable of conditional reasoning using a knowledge base. We demonstrate that LeafAI can rival an experienced human programmer in finding patients eligible for clinical trials.


Subjects
Natural Language Processing , Unified Medical Language System , Humans , Knowledge Bases , Clinical Trials as Topic
14.
J Biomed Inform ; 143: 104415, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37276949

ABSTRACT

Disease knowledge graphs have emerged as a powerful tool for artificial intelligence to connect, organize, and access diverse information about diseases. Relations between disease concepts are often distributed across multiple datasets, including unstructured plain text datasets and incomplete disease knowledge graphs. Extracting disease relations from multimodal data sources is thus crucial for constructing accurate and comprehensive disease knowledge graphs. We introduce REMAP, a multimodal approach for disease relation extraction. The REMAP machine learning approach jointly embeds a partial, incomplete knowledge graph and a medical language dataset into a compact latent vector space, aligning the multimodal embeddings for optimal disease relation extraction. Additionally, REMAP utilizes a decoupled model structure to enable inference in single-modal data, which can be applied under missing modality scenarios. We apply the REMAP approach to a disease knowledge graph with 96,913 relations and a text dataset of 1.24 million sentences. On a dataset annotated by human experts, REMAP improves language-based disease relation extraction by 10.0% (accuracy) and 17.2% (F1-score) by fusing disease knowledge graphs with language information. Furthermore, REMAP leverages text information to recommend new relationships in the knowledge graph, outperforming graph-based methods by 8.4% (accuracy) and 10.4% (F1-score). REMAP is a flexible multimodal approach for extracting disease relations by fusing structured knowledge and language information. This approach provides a powerful model to easily find, access, and evaluate relations between disease concepts.


Subjects
Artificial Intelligence , Machine Learning , Humans , Unified Medical Language System , Language , Natural Language Processing
15.
J Biomed Inform ; 144: 104431, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37385327

ABSTRACT

In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuning models stemming from broad-coverage checkpoints can benefit largely from additional training rounds over large-scale in-domain resources. However, these resources are often unreachable for less-resourced languages like Italian, preventing local medical institutions from employing in-domain adaptation. In order to reduce this gap, our work investigates two accessible approaches to derive biomedical language models in languages other than English, taking Italian as a concrete use-case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the set of lessons learned from the study constitutes valuable insights towards a solution to build biomedical language models that are generalizable to other less-resourced languages and different domain settings.


Subjects
Language , Natural Language Processing , Humans , Records , Italy , Unified Medical Language System
16.
Stud Health Technol Inform ; 305: 186-189, 2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37386992

ABSTRACT

Developing clinical search engines is a pressing task in medical informatics. The main challenge in this area is high-quality processing of unstructured text. The UMLS, an interdisciplinary ontological metathesaurus, can be used to address this problem. Currently, there is no unified method for aggregating relevant information from the UMLS. In this research, we represented the UMLS as a graph model and spot-checked its structure to identify basic problems. We then created a new graph metric and integrated it into two program modules that we developed for aggregating relevant knowledge from the UMLS.


Subjects
Medical Informatics , Unified Medical Language System , Interdisciplinary Studies , Knowledge , Search Engine
17.
BMC Med Inform Decis Mak ; 23(1): 86, 2023 May 05.
Article in English | MEDLINE | ID: mdl-37147628

ABSTRACT

BACKGROUND: Computational text phenotyping is the practice of identifying patients with certain disorders and traits from clinical notes. Rare diseases are challenging to identify because few cases are available for machine learning and data annotation requires domain experts. METHODS: We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directional Transformers (e.g. BERT). The ontology-driven framework includes two steps: (i) Text-to-UMLS, extracting phenotypes by contextually linking mentions to concepts in Unified Medical Language System (UMLS), with a Named Entity Recognition and Linking (NER+L) tool, SemEHR, and weak supervision with customised rules and contextual mention representation; (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in Orphanet Rare Disease Ontology (ORDO). The weakly supervised approach is proposed to learn a phenotype confirmation model to improve Text-to-UMLS linking, without annotated data from domain experts. We evaluated the approach on three clinical datasets, MIMIC-III discharge summaries, MIMIC-III radiology reports, and NHS Tayside brain imaging reports from two institutions in the US and the UK, with annotations. RESULTS: The improvements in the precision were pronounced (by over 30% to 50% absolute score for Text-to-UMLS linking), with almost no loss of recall compared to the existing NER+L tool, SemEHR. Results on radiology reports from MIMIC-III and NHS Tayside were consistent with the discharge summaries. The overall pipeline processing clinical notes can extract rare disease cases, mostly uncaptured in structured data (manually assigned ICD codes). CONCLUSION: The study provides empirical evidence for the task by applying a weakly supervised NLP pipeline on clinical notes. The proposed weakly supervised deep learning approach requires no human annotation except for validation and testing, by leveraging ontologies, NER+L tools, and contextual representations. The study also demonstrates that Natural Language Processing (NLP) can complement traditional ICD-based approaches to better estimate rare diseases in clinical notes. We discuss the usefulness and limitations of the weak supervision approach and propose directions for future studies.


Subjects
Natural Language Processing , Rare Diseases , Humans , Rare Diseases/diagnosis , Machine Learning , Unified Medical Language System , International Classification of Diseases
18.
Artif Intell Med ; 140: 102551, 2023 06.
Article in English | MEDLINE | ID: mdl-37210157

ABSTRACT

Text-Based Medical Image Retrieval (TBMIR) has been successful in retrieving medical images with textual descriptions. Usually, these descriptions are very brief and cannot express the whole visual content of the image in words, which negatively affects retrieval performance. One of the solutions offered in the literature is to form a Bayesian Network thesaurus from medical terms extracted from the image datasets. Although this solution is interesting, it is not efficient, as it depends heavily on the co-occurrence measure, the layer arrangement, and the arc directions. A significant drawback of the co-occurrence measure is that it generates many uninteresting co-occurring terms. Several studies have applied association rule mining and its measures to discover correlations between terms. In this paper, we propose a new efficient Association Rule-Based Bayesian Network (R2BN) model for TBMIR using updated medically-dependent features (MDF) based on the Unified Medical Language System (UMLS). The MDF are a set of medical terms that refer to the imaging modalities, the image color, the searched object dimension, etc. The proposed model represents the association rules mined from the MDF in the form of a Bayesian Network model. Then, it exploits the association rule measures (support, confidence, and lift) to prune the Bayesian Network model for efficient computation. The proposed R2BN model is combined with a probabilistic model from the literature to predict the relevance of an image to a given query. Experiments are carried out with ImageCLEF medical retrieval task collections from 2009 to 2013. Results show that our proposed model significantly enhances image retrieval accuracy compared to state-of-the-art retrieval models.
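The three association rule measures named in this abstract have standard definitions, which a short Python sketch can make concrete. The function name `rule_measures`, the example terms, and the representation of each image description as a set of terms are assumptions for illustration; the abstract does not state how the R2BN model computes or thresholds these measures.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple


def rule_measures(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    support    = P(A and C)
    confidence = P(C | A) = P(A and C) / P(A)
    lift       = confidence / P(C)
    """
    a: FrozenSet[str] = frozenset(antecedent)
    c: FrozenSet[str] = frozenset(consequent)
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")

    n_a = sum(1 for t in transactions if a <= t)         # contain the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # contain the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # contain both

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical term sets extracted from four image descriptions.
descriptions = [
    {"mri", "brain"},
    {"mri", "brain", "grayscale"},
    {"ct", "chest"},
    {"mri", "knee"},
]
support, confidence, lift = rule_measures(descriptions, ["mri"], ["brain"])
```

A lift above 1 indicates that the two terms co-occur more often than independence would predict, so a pruning step could plausibly keep only rules whose measures exceed chosen thresholds; the thresholds themselves would be a design choice not given in the abstract.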


Subjects
Information Storage and Retrieval , Models, Statistical , Bayes Theorem , Unified Medical Language System
19.
Stud Health Technol Inform ; 302: 808-812, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203500

ABSTRACT

Many concepts in the medical literature are named after persons. Frequent ambiguities and spelling variants, however, complicate the automatic recognition of such eponyms with natural language processing (NLP) tools. Recently developed methods include word vectors and transformer models that incorporate context information into the downstream layers of a neural network architecture. To evaluate these models for classifying medical eponymy, we labeled eponyms and counterexamples mentioned in a convenience sample of 1,079 PubMed abstracts, and fitted logistic regression models to the vectors from the first (vocabulary) and last (contextualized) layers of a SciBERT language model. According to the area under sensitivity-specificity curves, models based on contextualized vectors achieved a median performance of 98.0% on held-out phrases. This outperformed models based on vocabulary vectors (95.7%) by a median of 2.3 percentage points. When processing unlabeled inputs, such classifiers appeared to generalize to eponyms that did not appear among any annotations. These findings attest to the effectiveness of developing domain-specific NLP functions based on pre-trained language models, and underline the utility of context information for classifying potential eponyms.
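The evaluation metric in this abstract, the area under the sensitivity-specificity curve, is equivalent to the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following Python sketch computes it that way. The function name `auc` and the toy scores are assumptions for illustration; the study's own classifiers and data are not reproduced here.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy classifier scores: label 1 = eponym, label 0 = counterexample.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)  # 3 of 4 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a small held-out set; a rank-based computation would be preferable for large ones.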


Subjects
Language , Neural Networks, Computer , Natural Language Processing , PubMed , Unified Medical Language System
20.
Stud Health Technol Inform ; 302: 823-824, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203506

ABSTRACT

This paper describes a first attempt to map UMLS concepts to pictographs as a resource for translation systems for the medical domain. An evaluation of pictographs from two freely available sets shows that for many concepts no pictograph could be found and that word-based lookup is inadequate for this task.


Subjects
Unified Medical Language System